# README: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning

Ranjit Lall and Thomas Robinson

Last edit: October 2020

## Contents:
1. main_replication.R -- single file to replicate all results reported in the main paper.

2. replication_guide.md -- further guidance on reproducing the analysis and underlying simulation data in this article

3. package_dependencies.R -- script for setting up R packages on a server instance

4. makefile.sh -- shell script for simulations

5. MIDAS-master.zip -- copy of MIDAS source code used during simulations (for reference)

6. **adult/** -- simulation scripts and results files for tests using adult data

7. **application/** -- simulation scripts and results for tests using CCES data

8. **data/** -- raw data files used in simulations/replication

9. **figures/** -- image files of figures in main article and appendix

10. **kropko/** -- simulation scripts and results for tests based on Kropko (2014) simulations

*Please see the replication guide for a more detailed description of the contents.*

---

The following R packages and dependencies (including version numbers) are required for replicating all figures and tables in the paper.

* tidyverse (1.3.0)
* xtable (1.8-4)
* MASS (7.3-53)
* Amelia (1.7.6)
* mice (3.11.0)
* ggpubr (0.4.0)
* reshape2 (1.4.4)

Replication code was run using R version 4.0.3.

Additional R packages for simulation:

* mi (1.0)
* mvnmle (0.1-11.1)
* norm (1.0-9.5)
* nnet (7.3-12)
* arm (1.10-1)
* dplyr (0.8.4)
* readr (1.3.1)
* purrr (0.3.3)
* betareg (3.1-2)
* norm2 (2.0.2)
* doParallel (1.0.15)
* ggplot2 (3.2.1)
* haven (2.3.1)
* foreign (0.8-75)

N.B. Simulation code was run using R version 3.6.2 and Amelia version 1.7.5.

The following Python packages and dependencies are required (including version numbers).

* Python (3.7.3)
* NumPy (1.18.1)
* pandas (0.24.2)
* TensorFlow (1.14.0)
* Matplotlib (3.1.0)
* scikit-learn (0.21.2)
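
Before running the simulations, it may help to confirm that the local Python environment matches the versions listed above. The following is a minimal sketch and not part of the replication archive: the names `REQUIRED`, `installed_version`, and `find_mismatches` are hypothetical, and the check assumes each package exposes a `__version__` attribute, which is a common convention rather than a guarantee.

```python
import importlib

# Versions copied from the list above. Keys are import names, so
# scikit-learn appears under the name it is imported as ("sklearn").
REQUIRED = {
    "numpy": "1.18.1",
    "pandas": "0.24.2",
    "tensorflow": "1.14.0",
    "matplotlib": "3.1.0",
    "sklearn": "0.21.2",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Assumption: the package follows the __version__ convention.
    return getattr(module, "__version__", None)


def find_mismatches(required, lookup=installed_version):
    """Return {name: (expected, found)} for each package whose version differs."""
    mismatches = {}
    for name, expected in required.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` returns an empty dictionary when every listed package is installed at the stated version; otherwise each entry pairs the expected version with the one found (`None` if the package could not be imported). The `lookup` argument exists only so the comparison can be exercised without importing the real packages.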

---

Approximate runtime of the replication script on a standard laptop (dual-core Intel Core i5-7360U @ 2.3GHz, 8.00GB RAM, macOS 10.15.5):

* main_replication.R -- 3.5 minutes

*For individual simulation runtimes, please see the replication guide. Estimated total runtime: 461 hours.*

